Imputation And Classification Of Missing Data Using Least Square Support Vector Machines – A New Approach In Dementia Diagnosis

Authors

  • T. R. Sivapriya
  • V. Thavavel
Abstract

This paper compares different data imputation approaches for filling in missing data and proposes a combined approach to accurately estimate missing attribute values in a patient database. The study suggests a more robust technique that is likely to supply a value closer to the one that is missing, for effective classification and diagnosis. First, the data is clustered and the z-score method is used to select possible values for an instance with missing attribute values. Then a multiple imputation method using LSSVM (Least Squares Support Vector Machine) is applied to select the most appropriate values for the missing attributes. Five imputed datasets have been used to demonstrate the performance of the proposed method. Experimental results show that the method outperforms conventional multiple imputation and mean substitution. Moreover, the proposed method, CZLSSVM (Clustered Z-score Least Square Support Vector Machine), has been evaluated on two classification problems with incomplete data. The efficacy of the imputation methods has been evaluated using an LSSVM classifier. Experimental results indicate that classification accuracy increases when CZLSSVM is used to estimate missing attribute values. CZLSSVM is found to outperform other data imputation approaches such as decision trees, rough sets, artificial neural networks, K-NN (K-Nearest Neighbour) and SVM. Further, CZLSSVM is observed to yield 95 per cent accuracy and better prediction capability than the other methods included and tested in the study.

Keywords: Least Square Support Vector Machine; z-score; Classification; KNN; Support Vector Machine.
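The LSSVM imputation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only regression-based imputation of a single attribute using the standard LS-SVM linear system with an RBF kernel, and it omits the clustering and z-score candidate selection, which the abstract does not specify. The function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`, `impute_column`), the hyperparameters `gamma` and `sigma`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Squared Euclidean distance between every row of A and every row of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM regression reduces to one linear system:
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def impute_column(data, col, gamma=10.0, sigma=1.0):
    # Assumption: only column `col` has NaNs; all other columns are complete.
    data = data.astype(float).copy()
    missing = np.isnan(data[:, col])
    others = [j for j in range(data.shape[1]) if j != col]
    X_obs, y_obs = data[~missing][:, others], data[~missing, col]
    b, alpha = lssvm_fit(X_obs, y_obs, gamma, sigma)
    data[missing, col] = lssvm_predict(
        X_obs, b, alpha, data[missing][:, others], sigma
    )
    return data

# Hypothetical toy data: second attribute is twice the first, one value missing.
x = np.linspace(0.0, 1.0, 11)
toy = np.column_stack([x, 2.0 * x])
toy[5, 1] = np.nan
filled = impute_column(toy, col=1)
print(filled[5, 1])
```

In a multiple-imputation setting, a step like this would be repeated to produce several completed datasets (the abstract mentions five); how the paper generates the variation between them is not stated here.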

Similar resources

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES

Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems has been researched in order to improve classification accuracy. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only considers both accuracy and generalization criteria in a single objective fu...

A classifier ensemble approach for the missing feature problem

OBJECTIVES: Many classification problems must deal with data that contains missing values. In such cases data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets. MATERIALS AND METHODS: Several state-of-the-art approaches are compared using...

Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

The prediction of lithology is necessary in all areas of petroleum engineering: to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data provide a wealth of information across thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with statistical methods, selecting high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...


Publication date: 2012